
    Web-Based Emergent Manuscript Transcriptions

    This project, part of the larger Emergent Transcriptions initiative, designed and implemented tools that help archives digitally store historical manuscripts and their transcriptions and that enable searching of their collections. Our contributions to the initiative included optimizing existing visual processing systems using a genetic algorithm, and building a database and web front-end for the system to facilitate searching and editing of archival collections. We also redesigned core features using sound software engineering principles to make future additions to the system easier.

    Weather and Climate Information for Tourism

    The tourism sector is one of the largest and fastest-growing global industries and is a significant contributor to national and local economies around the world. The interface between climate and tourism is multifaceted and complex, as climate represents both a vital resource to be exploited and an important limiting factor that poses risks to be managed by the tourism industry and tourists alike. All tourism destinations and operators are climate-sensitive to a degree, and climate is a key influence on travel planning and the travel experience. This chapter provides a synopsis of the capacities and needs for climate services in the tourism sector, including current and emerging applications of climate services by diverse tourism end-users, and a discussion of the key knowledge gaps, research and capacity-building needs, and partnerships required to accelerate the application of climate information to manage risks from climate variability and facilitate successful adaptation to climate change.

    MalNet: A Large-Scale Cybersecurity Image Database of Malicious Software

    Computer vision is playing an increasingly important role in automated malware detection with the rise of the image-based binary representation. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area; however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams have access to. This lack of availability hinders examination of existing work, development of new research, and dissemination of ideas. We introduce MalNet, the largest publicly available cybersecurity image database, offering 133x more images and 27x more classes than the only other public binary-image database. MalNet contains over 1.2 million images across a hierarchy of 47 types and 696 families. We provide extensive analysis of MalNet, discussing its properties and provenance. The scale and diversity of MalNet unlock new and exciting cybersecurity opportunities for the computer vision community, enabling discoveries and research directions that were previously not possible. The database is publicly available at www.mal-net.org.

    Developing Robust Models, Algorithms, Databases and Tools With Applications to Cybersecurity and Healthcare

    As society and technology become increasingly interconnected, so does the threat landscape. Once-isolated threats now pose serious concerns to highly interdependent systems, highlighting the fundamental need for robust machine learning. This dissertation contributes novel tools, algorithms, databases, and models, through the lens of robust machine learning, in a research effort to solve large-scale societal problems affecting millions of people in the areas of cybersecurity and healthcare. (1) Tools: We develop TIGER, the first comprehensive graph robustness toolbox; and our ROBUSTNESS SURVEY identifies critical yet missing areas of graph robustness research. (2) Algorithms: Our survey and toolbox reveal that existing work has overlooked lateral attacks on computer authentication networks. We develop D2M, the first algorithmic framework to quantify and mitigate network vulnerability to lateral attacks by modeling lateral attack movement from a graph-theoretic perspective. (3) Databases: To prevent lateral attacks altogether, we develop MALNET-GRAPH, the world’s largest cybersecurity graph database, containing over 1.2M graphs across 696 classes, and show the first large-scale results demonstrating the effectiveness of malware detection through a graph medium. We extend MALNET-GRAPH by constructing the largest binary-image cybersecurity database (MALNET-IMAGE), containing 1.2M images, 133× more images than the only other public database, enabling new discoveries in malware detection and classification research previously restricted to a few industry labs. (4) Models: To protect systems from adversarial attacks, we develop UNMASK, the first model that flags semantic incoherence in computer vision systems; it detects up to 96.75% of attacks and defends the model by correctly classifying up to 93% of attacks.
    Inspired by UNMASK’s ability to protect computer vision systems from adversarial attack, we develop REST, which creates noise-robust models through a novel combination of adversarial training, spectral regularization, and sparsity regularization. In the presence of noise, our method improves state-of-the-art sleep stage scoring by 71%, allowing us to diagnose sleep disorders earlier and in the home environment, while using 19× fewer parameters and 15× fewer MFLOPS. Our work has had significant impact on industry and society: the UNMASK framework laid the foundation for a multi-million dollar DARPA GARD award; the TIGER toolbox for graph robustness analysis is part of the Nvidia Data Science Teaching Kit, available to educators around the world; we released MALNET, the world’s largest graph classification database with 1.2M graphs; and the D2M framework has had a major impact on Microsoft products, inspiring changes to their approach to lateral attack detection.
    Ph.D.